Peng et al. [1] have reported the finding of long-range correlations in DNA sequences,
in agreement with the results of Li et al. [2]. In ref. 1 the DNA sequence was analyzed by
constructing a random walk in which a pyrimidine represents a step up and a purine a
step down. From this walk they calculated the quantity F(l) and verified that the slope
α of log F(l) versus log l is larger than 0.5 for intron-containing sequences, implying the
existence of long-range correlations. For intron-less sequences they found α = 0.5, which
is the exponent that characterizes a random sequence or a sequence with short-range
correlations.
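
The analysis described above can be sketched as follows. This is a minimal illustration, not the code of ref. 1: it assumes that F(l) is the root-mean-square fluctuation of the walk displacement over windows of length l, as defined in ref. 1 (the definition is not repeated in this text), and the function names `dna_walk` and `fluctuation` are our own.

```python
import numpy as np

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C, T), -1 for a purine (A, G)."""
    steps = np.array([1 if base in "CT" else -1 for base in seq.upper()])
    return np.concatenate(([0], np.cumsum(steps)))

def fluctuation(walk, l):
    """F(l): standard deviation of the displacement y(l0 + l) - y(l0) over all l0."""
    dy = walk[l:] - walk[:-l]
    return dy.std()

# Uncorrelated test sequence (an assumption for illustration, not real DNA).
rng = np.random.default_rng(0)
seq = "".join(rng.choice(list("ACGT"), size=50000))
walk = dna_walk(seq)

# Slope of log F(l) versus log l from a straight-line fit.
ls = np.array([4, 8, 16, 32, 64])
F = np.array([fluctuation(walk, l) for l in ls])
alpha = np.polyfit(np.log(ls), np.log(F), 1)[0]
print(alpha)
```

For an uncorrelated sequence such as this one, the fitted slope should come out close to 0.5, up to statistical noise.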

Prabhu et al. [3] and Chatzidimitriou-Dreismann et al. [4] noticed that in most cases
log F(l) against log l is not a straight line. They found nonlinear curves, with a local
slope larger than 0.5, both for intron-less sequences and for those containing introns. According
to these results, a well-defined fractal power exponent α does not usually exist for DNA
sequences.
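
To make the notion of a local slope concrete, the sketch below estimates α(l) = d log F / d log l by finite differences. The function name `local_slope` and the two test curves are assumptions chosen for illustration only. The second curve is a hypothetical crossover form, not data from refs. 3 or 4.

```python
import numpy as np

def local_slope(ls, F):
    """Local slope alpha(l) of log F versus log l, by finite differences."""
    return np.gradient(np.log(F), np.log(ls))

ls = np.logspace(0, 4, 50)

# A pure power law gives a constant local slope.
F_pure = ls ** 0.5
alpha_pure = local_slope(ls, F_pure)

# Hypothetical crossover curve: the local slope drifts from about 0.5 towards 1,
# so no single exponent describes the whole range.
F_cross = np.sqrt(ls + 0.01 * ls ** 2)
alpha_cross = local_slope(ls, F_cross)
```

A single exponent α is meaningful only when the local slope is flat over the fitted range, as in the first curve.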

We introduce here a simple iterative model of gene evolution, which mimics the observed
behavior of α in real DNA sequences. The model incorporates the basic features
of DNA evolution, that is, sequence elongation due to gene duplication, and mutations.
In our model we start with an intron-less sequence which consists of N_g genes of equal
length. Each gene has N_n nucleotides, pyrimidines and purines randomly
distributed in equal proportions (50% each). The process of evolution is simulated in the
following way: We choose at random a gene of our original sequence and in that gene we
choose at random one of its nucleotides. Then we change (mutate) this nucleotide from
a pyrimidine to a purine or vice versa. A copy of the chosen gene before the mutation is
added at the end of the chain. This old gene becomes an intron, and it is not modified
anymore. The genes that can mutate are always the first N_g genes, which are the exons
in our model. We iterate this process N_i times (N_i ≫ 1). In the end, our chain
consists of a "head" of N_g exons and a long "tail" of N_i introns.
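
The iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original simulation code: nucleotides are coded as +1 (pyrimidine) and -1 (purine), and the function name `evolve_chain` and its parameter names are our own.

```python
import numpy as np

def evolve_chain(n_genes, n_nucl, n_iter, rng):
    """Grow a chain by repeated gene duplication and point mutation.

    Sites are coded +1 (pyrimidine) and -1 (purine). Returns a 1-D array
    holding the n_genes exons followed by the n_iter introns.
    """
    # Initial intron-less sequence: random sites, each type with probability 1/2.
    exons = rng.choice([-1, 1], size=(n_genes, n_nucl))
    introns = np.empty((n_iter, n_nucl), dtype=exons.dtype)
    for it in range(n_iter):
        g = rng.integers(n_genes)   # gene chosen at random among the exons
        k = rng.integers(n_nucl)    # nucleotide chosen at random in that gene
        introns[it] = exons[g]      # copy of the gene before the mutation
        exons[g, k] *= -1           # pyrimidine <-> purine
    return np.concatenate([exons.ravel(), introns.ravel()])

rng = np.random.default_rng(0)
chain = evolve_chain(n_genes=10, n_nucl=3, n_iter=1000, rng=rng)
print(len(chain))   # (10 + 1000) * 3 sites
```

Only the first `n_genes` rows are ever mutated. Each intron is a frozen snapshot of an exon, so the tail records the mutation history of the head.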

Next, we plot log F(l) versus log l, and as in real DNA sequences we do not find a
straight line. The local slope α(l) of this curve is shown in the figure below. The parameter
values are N_g = 10, N_n = 3 and N_i = 10^5, 5 × 10^5, 10^6, which are represented by a solid,
dotted and dashed line, respectively. When compared with Fig. 8 of ref. 5 we observe 
that, for l ≪ N_g N_n (the length of our starting sequence), our results are in excellent
agreement with the ones obtained from the real DNA sequence. The exponent α increases
monotonically from 0.5 to approximately 0.79 and then decreases again to 0.5 in the limit of
very large l. We have studied a large region of the parameter space and have always found
the same asymptotic behavior for α.

In real DNA sequences, however, introns and exons are interspersed. For that reason,
we also studied chains of the above model in which we perform a random shuffling of exons
and introns. The only difference we noticed in this case was that the maximum value α
reaches is now about 0.62.
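
One way to realize such a shuffling is sketched below. The text does not specify the procedure, so this assumes that whole genes (blocks of N_n consecutive sites) are permuted uniformly at random while the order of sites inside each gene is kept. The function name `shuffle_genes` is hypothetical.

```python
import numpy as np

def shuffle_genes(chain, n_nucl, rng):
    """Randomly permute whole genes (blocks of n_nucl sites) along the chain."""
    genes = np.asarray(chain).reshape(-1, n_nucl)
    return genes[rng.permutation(len(genes))].ravel()

rng = np.random.default_rng(0)
chain = np.arange(12)                    # stand-in chain: 4 labeled genes, 3 sites each
shuffled = shuffle_genes(chain, 3, rng)
print(shuffled.reshape(-1, 3))           # same genes, new order
```

The labeled stand-in chain makes it easy to see that genes move as units. In practice the input would be the chain of +1 and -1 sites produced by the growth model.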

We also found that if we iterate our model by making copies of genes without any
mutation, the value of α is constant and equal to 0.5. This shows that the
increase of α is due to the mutations.

We conclude from our simple DNA growth model that mutations have the following
effect on the correlation F(l) of the sequence: For chains of intermediate size, like the ones
analyzed in refs. 1-5, one finds an increase in the exponent α. This is, however, a transient
behavior, since when chains a hundred times longer (10^6) are analyzed, α is back to 0.5, i.e.,
the value of a random walk. We therefore conclude that the interesting discovery of refs.
1 and 2 is probably the fingerprint of the mutations that occurred during the evolution
of the DNA strand, but that it is only a transient behavior for chains that are not too
long, and not a real asymptotic long-range correlation.

Figure - Local slope α(l) of log F(l) versus log l for N_g = 10, N_n = 3 and N_i = 10^5, 5 × 10^5,
10^6, represented by a solid, dotted and dashed line, respectively, for an ensemble
average of 40 realizations.